Skip to content

[Graphite MQ] Draft PR GROUP:spec_6358ce (PRs 4148)#4150

Closed
graphite-app[bot] wants to merge 1 commit intomainfrom
gtmq_spec_6358ce_1770418811528-25784604-77b7-4ddc-8ab4-f59b3bc5c6b3
Closed

[Graphite MQ] Draft PR GROUP:spec_6358ce (PRs 4148)#4150
graphite-app[bot] wants to merge 1 commit intomainfrom
gtmq_spec_6358ce_1770418811528-25784604-77b7-4ddc-8ab4-f59b3bc5c6b3

Conversation

@graphite-app
Copy link
Contributor

@graphite-app graphite-app bot commented Feb 6, 2026

This draft PR was created by the Graphite merge queue.
Trunk will be fast forwarded to the HEAD of this PR when CI passes, and the original PRs will be closed.

The following PRs are included in this draft PR:

…g, corrupt history preventing wf pull (#4148)

# Description

Please include a summary of the changes and the related issue. Please also include relevant motivation and context.

## Type of change

- [ ] Bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
- [ ] This change requires a documentation update

## How Has This Been Tested?

Please describe the tests that you ran to verify your changes.

## Checklist:

- [ ] My code follows the style guidelines of this project
- [ ] I have performed a self-review of my code
- [ ] I have commented my code, particularly in hard-to-understand areas
- [ ] I have made corresponding changes to the documentation
- [ ] My changes generate no new warnings
- [ ] I have added tests that prove my fix is effective or that my feature works
- [ ] New and existing unit tests pass locally with my changes
@railway-app
Copy link

railway-app bot commented Feb 6, 2026

This PR was not deployed automatically as @graphite-app[bot] does not have access to the Railway project.

In order to get automatic PR deploys, please add @graphite-app[bot] to your workspace on Railway.

@claude
Copy link

claude bot commented Feb 6, 2026

Code Review

This PR (merge queue for #4148) addresses three important bug fixes in the Pegboard and Gasoline packages. Overall, the changes look solid with good defensive programming practices. Here's my detailed feedback:

✅ Strengths

  1. Improved Error Handling in Gasoline KV: The change from ensure\! to graceful error logging + early return (return Ok(None)) is excellent. This prevents workflow pulls from failing catastrophically when encountering corrupt history data.

    • Lines: engine/packages/gasoline/src/db/kv/mod.rs:1635-1640, 1650-1658
  2. Fixed Slot Over-reporting: The conditional logic in clear_pending_allocation now correctly only marks allocated_serverless_slot = false when both conditions are met (was allocated AND successfully cleared). This prevents incorrect slot accounting.

    • Lines: engine/packages/pegboard/src/workflows/actor/runtime.rs:1135-1137
  3. Added Missing Metrics Signal: The Destroy signal is now properly sent to the metrics workflow when an actor is destroyed early, fixing the metrics workflow not stopping bug.

    • Lines: engine/packages/pegboard/src/workflows/actor/mod.rs:329-335
  4. Better Logging Level: Changed allocation failure from warn to debug at runtime.rs:519 - this is appropriate since it's an expected condition during normal operation.

  5. Improved Test Runner Configuration: The environment variable parsing is now more robust with explicit defaults and proper boolean parsing.

🔍 Issues & Suggestions

Critical: Type Inconsistency in TypeScript

// Line 19
const RIVET_RUNNER_TOTAL_SLOTS = parseInt(process.env.RIVET_RUNNER_TOTAL_SLOTS ?? "1");

Issue: parseInt can return NaN if the environment variable contains invalid data. This could cause runtime errors.

Recommendation: Add validation or use a safer parsing approach:

const RIVET_RUNNER_TOTAL_SLOTS = parseInt(process.env.RIVET_RUNNER_TOTAL_SLOTS ?? "1") || 1;
// OR
const slotsStr = process.env.RIVET_RUNNER_TOTAL_SLOTS ?? "1";
const RIVET_RUNNER_TOTAL_SLOTS = isNaN(parseInt(slotsStr)) ? 1 : parseInt(slotsStr);

Potential Issue: Return Value Type Change

// Line 1693 in gasoline/src/db/kv/mod.rs
Ok(Some(events_by_location))  // Changed from Ok(events_by_location)

Impact: This changes the return type to Option<...> and requires the caller to handle the new None case (lines 1710-1712).

Question: Have all callsites been updated to handle the Option? The pattern matching at 1710-1712 looks correct, but verify no other code paths call this function.

Minor: Inconsistent String Comparison

// Lines 22-24
const AUTOSTART_SERVER = (process.env.AUTOSTART_SERVER ?? "1") == "1";
const AUTOSTART_RUNNER = (process.env.AUTOSTART_RUNNER ?? "0") == "1";
const AUTOCONFIGURE_SERVERLESS = (process.env.AUTOCONFIGURE_SERVERLESS ?? "1") == "1";

Recommendation: Use strict equality (===) instead of loose equality (==) for better type safety:

const AUTOSTART_SERVER = (process.env.AUTOSTART_SERVER ?? "1") === "1";

Documentation: Comment Quality

The new comment at runtime.rs:1115-1117 is well-written and explains the logic clearly. However, consider adding a comment explaining why allocated_serverless_slot && cleared is the correct condition at line 1136.

🧪 Testing Considerations

  1. Corrupt History Recovery: Test that workflows can successfully pull their state after encountering corrupt indexed names/chunks
  2. Slot Accounting: Verify serverless slots are correctly incremented/decremented across the full actor lifecycle (allocation → sleep → destroy paths)
  3. Metrics Workflow: Confirm the metrics workflow properly receives the Destroy signal and terminates correctly
  4. Test Runner Config: Test with various invalid environment variables to ensure graceful handling

📊 Performance & Security

  • Performance: ✅ No performance concerns. Early returns on corrupt data prevent wasted processing.
  • Security: ✅ No security issues identified. The changes are defensive and reduce failure scenarios.

📝 Style Compliance

  • ✅ Follows Rust formatting conventions (hard tabs)
  • ✅ Logging follows structured logging patterns (tracing::error\!(?wf, ...))
  • ✅ Error messages are lowercase as per CLAUDE.md guidelines
  • ⚠️ TypeScript could use strict equality operators

🎯 Recommendations

Before Merge:

  1. Fix the parseInt NaN handling in test-runner
  2. Use strict equality (===) for boolean env var comparisons
  3. Verify all callsites handle the new Option return type from the transaction closure

After Merge:

  1. Add integration tests for the corrupt history recovery path
  2. Add tests for the slot accounting edge cases

Summary

This is a solid bug fix PR that addresses real production issues with good defensive programming. The main concern is the parseInt NaN handling which should be addressed before merge. The other changes are well-thought-out improvements to error handling and metrics reporting.

Verdict: ✅ Approve with minor changes requested

@graphite-app graphite-app bot deleted the gtmq_spec_6358ce_1770418811528-25784604-77b7-4ddc-8ab4-f59b3bc5c6b3 branch February 6, 2026 23:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant